Rethinking simultaneous suppression in visual cortex via compressive spatiotemporal population receptive fields

When multiple visual stimuli are presented simultaneously in the receptive field, the neural response is suppressed compared to presenting the same stimuli sequentially. The prevailing hypothesis suggests that this suppression is due to competition among multiple stimuli for limited resources within receptive fields, governed by task demands. However, it is unknown how stimulus-driven computations may give rise to simultaneous suppression. Using fMRI, we find simultaneous suppression in single voxels, which varies with both stimulus size and timing, and progressively increases up the visual hierarchy. Using population receptive field (pRF) models, we find that compressive spatiotemporal summation rather than compressive spatial summation predicts simultaneous suppression, and that increased simultaneous suppression is linked to larger pRF sizes and stronger compressive nonlinearities. These results necessitate a rethinking of simultaneous suppression as the outcome of stimulus-driven compressive spatiotemporal computations within pRFs, and open new opportunities to study visual processing capacity across space and time.


Field-specific reporting
Please select the one below that is the best fit for your research. If you are not sure, read the appropriate sections before making your selection.

Life sciences

Life sciences study design
All studies must disclose on these points even when the disclosure is negative.

Replication
Minimally preprocessed behavioral, eye, and fMRI data are publicly available at https://osf.io/rpuhs/ (data for main paper figures) and https://osf.io/e83az/ (data for supplementary figures). Source data are available to create each data figure. In addition, for 7 participants who also participated in Kim et al. (2024), we downloaded pRF parameters (df_cv0_defaultHRF.mat) from the OSF storage page: https://osf.io/3gwhz/files/osfstorage.
A roughly equal number of male and female subjects were recruited for participation in this research (participants' self-assigned sex: 6 female, 4 male). The investigators do not expect significant gender differences, thus no sex- or gender-based analyses were performed.
There was no exclusion in this study based on race or ethnicity, and investigators do not expect significant race or ethnicity differences.
No covariate-related population characteristics of the human participants were used in the experimental design or analysis.
Participants were recruited from the Stanford University community and participated in two separate fMRI scanning sessions: one retinotopy with whole brain anatomy scanning session and one simultaneous-vs-sequential (SEQ-SIM) visual paradigm scanning session. We do not expect biases in participant recruitment to meaningfully impact these results. Several participants were fMRI researchers (including all three authors) within the Stanford Psychology Department, which may have increased the data quality (insofar as it is dependent on participant motion and alertness) relative to a random population sample. Only authors KGS and ERK were aware of the specific hypotheses. We believe this awareness did not affect the results, as their behavioral and eye fixation performance did not vary across conditions and was within the range of other participants' performance.
All study procedures were approved by the Stanford Internal Review Board on Human Subjects Research.
Population receptive fields (pRFs) are modeled within each participant and voxel independently. The sample size (number of participants) was determined from previous studies employing similar pRF methodologies in the visual system, which use data from a range of 5-28 participants (5 - Klein, Harvey, & Dumoulin, 2014 Neuron; 6 - Dumoulin & Wandell 2008 NeuroImage; 6 - Zhou et al. 2018 J Neurosci; 5 - Kay et al. 2013 J Neurophys; 12 - Stigliani et al. 2017 PNAS; 13 - Poltoratski et al. 2021 Nature Comms; 28 - Finzi et al. 2020 Nature Comms). Here we collected data from 11 participants. One participant was excluded due to excessive motion (see below) and we report data from 10 participants. The number of voxels per participant per area is specific to (i) the size of their independently-defined cortical visual areas, determined empirically from independent retinotopy data, and (ii) coverage of these visual areas in corresponding data from the SEQ-SIM experiment. Overall, we obtained data in most participants' visual areas, except 6 participants who had insufficient coverage of IPS0/1 and 2 participants who had insufficient coverage of TO1/2, due to fewer slices in the SEQ-SIM experiment.
One participant's data were excluded from the fMRI experiment due to excessive and abrupt head motion during scans and across scans (> 1 voxel, 2.4 mm). For one participant, we could not collect eye gaze data in the SEQ-SIM experiment due to constraints in the mirror setup. Of the 9 participants with continuously recorded eye gaze data, 4 participants were excluded due to excessive measurement noise or no data in >80% of time points within runs. Both criteria were established in advance.
Each voxel's data were analyzed independently for each visual area (~5000±1500 voxels per participant). This process was repeated independently in 10 participants. Split-half reliability of voxel data in the SEQ-SIM experiment was high: correlations of 0.70 to 0.96 across visual areas and participants. Voxel data were fit using a split-half run cross-validation procedure, separately for the three pRF models (CST, compressive spatiotemporal; CSS, compressive spatial summation; LSS, linear spatial summation).
All conditions were tested within-participants, within-voxels, so random allocation into groups was not necessary.
Group allocation was not performed; thus, blinding was not necessary.
Retinotopy: Event-related (2 s per bar position). Sequential-vs-simultaneous: Block design (8 s on/stimulus, 12 s off/blank - mean luminance gray screen).
The main SEQ-SIM experiment had eight ~5.5-minute runs: 4 repeats of 2 runs (except for one participant, who had 3 repeats of 2 runs). The 2 runs contained 16 stimulus blocks, 4 repeats for each of the 8 conditions (2 sequence orders x 2 sizes x 2 timings). Each participant was assigned to a unique set of runs, where block order was pseudo-randomized across the two runs. Stimulus content (cropped squares from colorful cartoons) was updated for each participant's run, block, and trial. The retinotopy experiment contained 4 repeated runs of ~3.4 minutes in which bar stimuli traversed the visual field in a circular aperture. Cartoon images inside the bar changed randomly at 8 Hz. The bar swept in 12 discrete steps, 2 s per bar position, for 4 orientations (0°, 45°, 90°, 135°) and 2 motion directions for each orientation.
Throughout the main SEQ-SIM experiment, participants fixated at the center of the screen, while performing a challenging RSVP letter 1-back detection task at fixation. Throughout the retinotopy experiment, participants fixated on a small colored dot at the center of the screen, while performing a color-change detection task on the dot (red to green, green to red). Performance was monitored via recorded button box presses, and fixation was monitored via eye tracking in the scanner.
Structural MRI: Whole brain. Functional: EPI slice prescriptions were oblique, roughly perpendicular to the calcarine sulcus, acquiring data from occipital, parietal and temporal lobes. The retinotopy experiment had 28 slices; the main SEQ-SIM experiment had 14 slices.
Structural whole brain anatomy scans were aligned to the AC-PC line using
Participants' functional scans were aligned with the inplane to their whole brain anatomy scan, with a coarse, followed by a fine, 3D rigid body alignment (6 DoF) using the alignvolumedata toolbox (https://github.com/cvnlab/alignvolumedata). The first 8 (SEQ-SIM) or 6 (retinotopy) volumes of each functional scan were removed to avoid data with unstable magnetization.
No normalization was applied; all data were analyzed in the native brain space of each participant.
The data were not normalized.
Between- and within-scan motion correction was applied, as well as high-pass filtering to remove fMRI drift. Voxels with a split-half reliability <10% in the SEQ-SIM experiment, and pRF model goodness-of-fit (R^2) in the retinotopy data <20%, were excluded.
No volume censoring was done.
SEQ-SIM experiment: Independently for each of the 6 pRF models (3 main, 3 supplementary), we fitted each voxel's predicted time course to the observed time course with split-half cross-validated linear regression (ordinary least squares), resulting in a cross-validated coefficient of determination (cv-R^2) for each voxel. To quantify simultaneous suppression, we fitted a linear mixed model (LMM) to all participants' voxels within a visual area, using a maximum likelihood fitting method.
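The split-half cross-validated regression described above can be illustrated with a small sketch. It assumes a single predicted time course per voxel, an OLS fit with an intercept (whether the original fit included an intercept is not stated here), and two run-averaged halves of the observed data; all function names and the toy numbers are hypothetical and not taken from the authors' code.

```python
def ols_fit(x, y):
    """Ordinary least squares for y ~ intercept + slope * x (single predictor)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def r_squared(y, y_hat):
    """Coefficient of determination; it can be negative on held-out data."""
    my = sum(y) / len(y)
    ss_res = sum((yi - hi) ** 2 for yi, hi in zip(y, y_hat))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def split_half_cv_r2(pred, obs_a, obs_b):
    """Fit on one half of the runs, score on the other half, average both folds."""
    scores = []
    for train, test in ((obs_a, obs_b), (obs_b, obs_a)):
        intercept, slope = ols_fit(pred, train)
        y_hat = [intercept + slope * p for p in pred]
        scores.append(r_squared(test, y_hat))
    return sum(scores) / 2


# Toy data (made up): one model prediction and two run-averaged data halves.
pred = [0.0, 1.0, 2.0, 1.0, 0.0, 0.5]
half_a = [0.1, 2.1, 3.9, 2.0, -0.1, 1.1]
half_b = [-0.1, 1.9, 4.1, 2.0, 0.1, 0.9]
cv_r2 = split_half_cv_r2(pred, half_a, half_b)
```

Because the scaling weights are estimated on one half and evaluated on the other, a model that only fits noise in the training half is penalized in cv-R^2.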
Retinotopy experiment: The CSS pRF model was fit to each voxel's average time course using 2-stage optimization (coarse grid-fit, followed by fine search-fit).
The LMM predicted the average simultaneous BOLD response of each voxel as a function of the average sequential BOLD response, for each stimulus condition (fixed interaction effect), allowing for a random intercept and slope per participant and stimulus condition (random interaction effect). Differences in LMM regression slopes were tested with a two-way repeated measures ANOVA (factors: visual area and stimulus conditions across participants). If there was a main effect (p<0.05), we used Bonferroni-corrected post-hoc multiple comparison t-tests (two-sided) to evaluate differences between stimulus conditions and visual areas. Differences in pRF model cv-R^2 were tested with a two-way repeated measures ANOVA across voxels of all participants and visual areas (factors: pRF model and visual area). If there was a main effect (p<0.05), we used Bonferroni-corrected post-hoc multiple comparison t-tests (two-sided) to evaluate differences between pRF models and visual areas.
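To make the slope measure concrete, the sketch below regresses simultaneous on sequential voxel amplitudes separately for each participant and then averages the slopes. This is a two-stage simplification and not the linear mixed model described above (it has no random effects and no maximum likelihood fit); the participant labels, amplitudes, and function names are hypothetical. Since the simultaneous response is modeled as a function of the sequential response, a slope below 1 corresponds to a smaller simultaneous than sequential response.

```python
from statistics import mean


def ols_slope_intercept(x, y):
    """Slope and intercept of the least-squares line y ~ intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def per_participant_slopes(data):
    """data maps participant id -> (seq_amplitudes, sim_amplitudes), one value per voxel."""
    return {pid: ols_slope_intercept(seq, sim)[0] for pid, (seq, sim) in data.items()}


# Hypothetical voxel amplitudes (% signal change) for one visual area and condition.
toy = {
    "P01": ([1.0, 2.0, 3.0, 4.0], [0.6, 1.1, 1.6, 2.1]),  # SIM = 0.1 + 0.5 * SEQ
    "P02": ([0.5, 1.5, 2.5, 3.5], [0.4, 1.2, 2.0, 2.8]),  # SIM = 0.8 * SEQ
}
slopes = per_participant_slopes(toy)
group_slope = mean(slopes.values())  # average slope across participants
```

A mixed model would instead estimate the fixed slope and the participant-level deviations jointly, which matters when participants contribute very different numbers of voxels.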
Pearson's correlation r was used to quantify the relationship between participant slopes averaged across conditions and effective pRF size, exponent, time constant, or semi-saturation constant across visual areas.
Whole brain in the independent fMRI retinotopy data: All voxels' spatial pRF parameters were used to create visual field maps in the native brain space of each subject. These maps were used to draw borders that define visual areas using guidelines from the literature (V1/V2
ROI-based in the main SEQ-SIM experiment: For each subject and visual area, we created an ROI that selected voxels with pRF centers within the circumference of the big square stimuli: 8.82° x 8.82° squares located 0.59° to 9.41° from display center in both x- and y-dimensions in the lower left and upper right quadrants. From these voxels, we used those with corresponding data from the SEQ-SIM experiment.
Voxel-wise. For each voxel, we report simultaneous suppression levels, pRF model parameters, and cross-validated R^2. In addition, we report the average suppression level per stimulus condition, pRF size, exponent, time constant, and semi-saturation constant across voxels within each visual area for each individual participant and across participants.
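The ROI criterion for the SEQ-SIM experiment can be sketched as a simple geometric test on pRF centers. This is a minimal illustration, assuming centers are given in degrees of visual angle with positive x to the right and positive y upward, and that the square edges are inclusive; the function names and example values are hypothetical.

```python
SQUARE_MIN_DEG = 0.59  # near edge of a big square, measured from display center
SQUARE_MAX_DEG = 9.41  # far edge (0.59 + 8.82)


def in_big_square(x0, y0):
    """True if a pRF center lies inside the upper-right or lower-left big square."""
    lo, hi = SQUARE_MIN_DEG, SQUARE_MAX_DEG
    upper_right = lo <= x0 <= hi and lo <= y0 <= hi
    lower_left = -hi <= x0 <= -lo and -hi <= y0 <= -lo
    return upper_right or lower_left


def select_voxels(centers, has_seqsim_data):
    """Indices of voxels whose center is in a square and that have SEQ-SIM data."""
    return [
        i
        for i, ((x0, y0), ok) in enumerate(zip(centers, has_seqsim_data))
        if ok and in_big_square(x0, y0)
    ]


# Hypothetical pRF centers (x0, y0) in degrees and SEQ-SIM coverage flags.
centers = [(5.0, 5.0), (-5.0, -5.0), (5.0, -5.0), (0.2, 0.2), (3.0, 3.0)]
covered = [True, True, True, True, False]
roi = select_voxels(centers, covered)  # -> [0, 1]
```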
None.
predict each voxel's observed time course using six different image-computable pRF encoding models. Spatial pRF parameters were independently estimated from each participant's retinotopy experiment using the CSS pRF model, resulting in a 2D Gaussian pRF with a center (x0, y0), standard deviation (σ) and exponent (n) parameter for each voxel. To predict voxels' responses in the SEQ-SIM experiment, we use the independent spatial pRF parameters to reconstruct linear spatial summation (LSS) and compressive spatial summation (CSS) pRFs. The CST pRFs used the spatial pRF parameters from the retinotopy experiment and neural temporal IRFs with fixed parameters based on Stigliani et al. 2017. Only one CST pRF model parameter was optimized: the compressive spatiotemporal static power-law exponent, using a grid-fit approach. In addition to the three main pRF models (LSS, CSS, CST), we tested three
Results are consistent across voxels and individuals.
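The spatial part of the LSS and CSS pRFs described above can be sketched as a 2D Gaussian weighting of the stimulus followed by a power-law exponent, in the form introduced by Kay et al. 2013 (n = 1 gives linear summation, n < 1 gives compression). This is a minimal sketch under stated assumptions: the stimulus is a list of sampled points rather than an image, the Gaussian is left unnormalized (its scale is absorbed into a hypothetical `gain` parameter), and no temporal or hemodynamic stage is included, so it does not implement the CST model. The effective size shown is the commonly used σ/√n definition.

```python
import math


def prf_response(stimulus, x0, y0, sigma, n, gain=1.0):
    """Static response to a stimulus given as (x, y, contrast) samples in degrees."""
    drive = 0.0
    for x, y, contrast in stimulus:
        weight = math.exp(-((x - x0) ** 2 + (y - y0) ** 2) / (2 * sigma ** 2))
        drive += contrast * weight
    return gain * drive ** n  # n = 1: LSS, n < 1: CSS


def effective_size(sigma, n):
    """Effective pRF size after compression, sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


def square_patch(cx, cy, half_width, step=0.5):
    """Uniform-contrast square sampled on a regular grid centered at (cx, cy)."""
    k = int(round(half_width / step))
    return [
        (cx + i * step, cy + j * step, 1.0)
        for i in range(-k, k + 1)
        for j in range(-k, k + 1)
    ]


# Two non-overlapping squares near a hypothetical pRF centered at (4.5, 4.5).
patch_a = square_patch(3.0, 3.0, 1.0)
patch_b = square_patch(6.0, 6.0, 1.0)
prf = dict(x0=4.5, y0=4.5, sigma=2.0)

lss_sum = prf_response(patch_a, n=1.0, **prf) + prf_response(patch_b, n=1.0, **prf)
lss_both = prf_response(patch_a + patch_b, n=1.0, **prf)
css_sum = prf_response(patch_a, n=0.5, **prf) + prf_response(patch_b, n=0.5, **prf)
css_both = prf_response(patch_a + patch_b, n=0.5, **prf)
```

With n = 1 the response to both squares equals the sum of the responses to each square alone, whereas with n = 0.5 the joint response is smaller than that sum. This shows only spatial subadditivity at a single time point; differences between sequential and simultaneous presentation also depend on the temporal stage that this sketch omits.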